Fast Reconstruction of Exact Maxwell Dynamics from Sparse Data

arXiv.org Machine Learning

We introduce FLASH-MAX, a shallow, exact-by-construction neural network architecture for predicting homogeneous electromagnetic fields from sparse pointwise observations. Each hidden neuron represents a separate exact solution to Maxwell's equations, so that the network satisfies the governing equations symbolically by construction and can be trained end-to-end from sparse data within seconds. We prove a universal approximation result showing that this exact model class remains universal on arbitrary domains. FLASH-MAX reaches sub-1% relative validation error from about 1K sparse pointwise observations in seconds, all while maintaining a zero PDE residual, and keeps single-digit errors even for only 100 observations sampled from 3D space. These results suggest that moving governing structure from the loss into the hypothesis class can dramatically improve the trade-off between precision and optimization speed in scientific machine learning.
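The abstract does not say what form each hidden neuron's exact solution takes. As a minimal sketch of the general idea, the block below assumes (purely for illustration) that a neuron is a vacuum plane wave in units where c = 1, and checks Faraday's law numerically for one neuron and for a sum of two. All function names here (`make_neuron`, `superpose`, `faraday_residual`) are hypothetical and not taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def make_neuron(k, a_raw):
    """Return (E, B) callables for one plane wave (assumption: vacuum, c = 1)."""
    kk = dot(k, k)
    # Project the amplitude onto the plane orthogonal to k, so div E = 0.
    a = tuple(ai - dot(a_raw, k) / kk * ki for ai, ki in zip(a_raw, k))
    omega = math.sqrt(kk)                       # dispersion relation omega = |k|
    b = tuple(c / omega for c in cross(k, a))   # B amplitude = (k x a) / omega

    def E(x, t):
        phase = dot(k, x) - omega * t
        return tuple(ai * math.cos(phase) for ai in a)

    def B(x, t):
        phase = dot(k, x) - omega * t
        return tuple(bi * math.cos(phase) for bi in b)

    return E, B

def superpose(fields):
    """Sum several vector fields; Maxwell's equations are linear, so sums stay exact."""
    return lambda x, t: tuple(sum(c) for c in zip(*(f(x, t) for f in fields)))

def curl(F, x, t, h=1e-5):
    """Central-difference curl of F at (x, t)."""
    def d(comp, axis):
        xp, xm = list(x), list(x)
        xp[axis] += h
        xm[axis] -= h
        return (F(tuple(xp), t)[comp] - F(tuple(xm), t)[comp]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def time_derivative(F, x, t, h=1e-5):
    return tuple((p - m) / (2 * h) for p, m in zip(F(x, t + h), F(x, t - h)))

def faraday_residual(E, B, x, t):
    """Largest component of curl E + dB/dt; zero up to finite-difference error."""
    return max(abs(c + d) for c, d in zip(curl(E, x, t), time_derivative(B, x, t)))
```

In a model of this kind, only the wave vectors and amplitudes would be fitted to the observations; the governing equations hold for any parameter values, which is the sense in which structure moves from the loss into the hypothesis class.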


L2T-DLN: Learning to Teach with Dynamic Loss Network

Neural Information Processing Systems

With the concept of teaching being introduced to the machine learning community, teacher models have started using dynamic loss functions to guide the training of a student model. A dynamic loss function is intended to adapt the loss to the different phases of student model learning. In existing works, the teacher model 1) merely determines the loss function based on the present states of the student model, i.e., disregards the experience of the teacher; 2) only utilizes the states of the student model, e.g., training iteration number and loss/accuracy from training/validation sets, while ignoring the states of the loss function. In this paper, we first formulate the loss adjustment as a temporal task by designing a teacher model with memory units, which enables the student learning to be guided by the experience of the teacher model. Then, with a dynamic loss network, we can additionally use the states of the loss to assist the teacher learning in enhancing the interactions between the teacher and the student model. Extensive experiments demonstrate that our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation scenarios.


DivBO: Diversity-aware CASH for Ensemble Learning

Neural Information Processing Systems

The Combined Algorithm Selection and Hyperparameter optimization (CASH) problem is one of the fundamental problems in Automated Machine Learning (AutoML). Motivated by the success of ensemble learning, recent AutoML systems build post-hoc ensembles to output the final predictions instead of using the best single learner. However, because most CASH methods focus on searching for a single learner with the best performance, they neglect the diversity among base learners (i.e., they may suggest configurations similar to previously evaluated ones), which is also a crucial consideration when building an ensemble. To tackle this issue and further enhance the ensemble performance, we propose DivBO, a diversity-aware framework that injects an explicit search for diversity into CASH problems. In the framework, we propose to use a diversity surrogate to predict the pair-wise diversity of two unseen configurations. Furthermore, we introduce a temporary pool and a weighted acquisition function to guide the search for both performance and diversity based on Bayesian optimization. Empirical results on 15 public datasets show that DivBO achieves the best average ranks (1.82 and 1.73) on both validation and test errors among 10 compared methods, including post-hoc designs in recent AutoML systems and state-of-the-art baselines for ensemble learning on CASH problems.
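The abstract does not give the form of the weighted acquisition function. As a rough illustration only, the sketch below assumes a simple additive combination of a predicted performance score and the mean predicted diversity against a temporary pool. The function names, the `weight` parameter, and the toy numbers are all hypothetical and are not DivBO's actual formula.

```python
def weighted_acquisition(perf, div_to_pool, weight):
    """Hypothetical score: performance plus weight * mean diversity to the pool."""
    mean_div = sum(div_to_pool) / len(div_to_pool) if div_to_pool else 0.0
    return perf + weight * mean_div

def pick(candidates, weight):
    """Return the name of the candidate with the highest combined score."""
    best = max(
        candidates,
        key=lambda c: weighted_acquisition(c["perf"], c["div"], weight),
    )
    return best["name"]

# Toy candidates: A is slightly stronger alone, B is far more diverse.
candidates = [
    {"name": "A", "perf": 0.90, "div": [0.05, 0.10]},
    {"name": "B", "perf": 0.88, "div": [0.60, 0.70]},
]
```

With `weight = 0` the choice reduces to a performance-only search and picks A; a positive weight such as 0.5 shifts the choice to the more diverse candidate B.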


Supplementary Materials

Neural Information Processing Systems

We first prove the direction $Z \perp T \Rightarrow SI(Z;T) = 0$, which is equivalent to proving $I(Z;T) = 0 \Rightarrow SI(Z;T) = 0$. We prove the contrapositive, i.e. rather than show LHS $\Rightarrow$ RHS, we show that $\neg$RHS $\Rightarrow$ $\neg$LHS. Now assume that $\sup_{w_i, v_j} \rho(w_i^\top Z_i, v_j^\top T_j) > \epsilon$ for some $i, j$. Then by setting those elements in $w, v$ unrelated to $Z_i, T_j$ to zero, and those related to $Z_i, T_j$ exactly the same as $w_i, v_j$, we know that $\sup_{w,v} \rho(w^\top Z, v^\top T) > \epsilon$. All neural networks are trained by Adam with its default settings and a learning rate $\eta = 0.001$. Early stopping is a useful technique for avoiding overfitting; however, it needs to be carefully considered when applied to adversarial methods.
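The zero-padding step in the excerpt above can be checked numerically. The sketch below assumes that ρ denotes the Pearson correlation and that Z_i and T_j are contiguous column blocks of Z and T. Both are assumptions made for illustration, and the helper names are hypothetical. It shows that padding the block weights with zeros gives the same correlation on the full vectors, so the supremum over full weights is at least the supremum over block weights.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def project(rows, w):
    """Compute w^T r for every row r."""
    return [sum(wi * ri for wi, ri in zip(w, row)) for row in rows]

def pad(w_block, start, dim):
    """Embed block weights into a zero vector of length dim."""
    w = [0.0] * dim
    w[start:start + len(w_block)] = w_block
    return w

rng = random.Random(0)
Z = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
# Make columns 1-2 of T depend on columns 2-3 of Z, plus noise.
T = [[rng.gauss(0, 1), z[2] + 0.3 * rng.gauss(0, 1), z[3] + rng.gauss(0, 1)]
     for z in Z]

w_i, v_j = [1.0, 0.5], [1.0, -0.3]      # weights on the blocks only
Z_i = [row[2:4] for row in Z]            # block Z_i = columns 2..3 of Z
T_j = [row[1:3] for row in T]            # block T_j = columns 1..2 of T

rho_block = pearson(project(Z_i, w_i), project(T_j, v_j))
rho_full = pearson(project(Z, pad(w_i, 2, 4)), project(T, pad(v_j, 1, 3)))
```

Here `rho_block` and `rho_full` agree up to floating-point rounding, which is exactly the inequality direction the proof step needs.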


Adapting Neural Architectures Between Domains

Neural Information Processing Systems

Neural architecture search (NAS) has demonstrated impressive performance in automatically designing high-performance neural networks. The power of deep neural networks is to be unleashed for analyzing a large volume of data (e.g.